Copula PC Algorithm for Causal Discovery from Mixed Data
Abstract
We propose the ‘Copula PC’ algorithm for causal discovery from a combination of continuous and discrete data, assumed to be drawn from a Gaussian copula model. It is based on a two-step approach. The first step applies Gibbs sampling on rank-based data to obtain samples of correlation matrices. These are then translated into an average correlation matrix and an effective number of data points, which in the second step are input to the standard PC algorithm for causal discovery. A stable version naturally arises when rerunning the PC algorithm on different Gibbs samples. Our ‘Copula PC’ algorithm extends the ‘Rank PC’ algorithm, which has been designed for Gaussian copula models for purely continuous data. In simulations, ‘Copula PC’ indeed outperforms ‘Rank PC’ in cases with mixed variables, in particular for larger numbers of data points, at the expense of a slight increase in computation time.
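The second step described above can be illustrated with a minimal sketch. It assumes the Gibbs sampler has already produced a stack of correlation matrices, and it takes the effective number of data points as a given input, since the abstract does not say how that number is derived. The function names (`average_correlation`, `partial_correlation`, `is_independent`) are hypothetical. The Fisher z-test on partial correlations is the conditional independence test commonly used with the PC algorithm for Gaussian data; its use here is an assumption, not something the abstract confirms.

```python
import math
from statistics import NormalDist

import numpy as np


def average_correlation(samples):
    """Average a stack of sampled correlation matrices of shape (m, p, p)."""
    mean = np.mean(np.asarray(samples, dtype=float), axis=0)
    # Rescale so the diagonal is exactly 1 after averaging.
    d = np.sqrt(np.diag(mean))
    return mean / np.outer(d, d)


def partial_correlation(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j] + list(cond)
    precision = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])


def is_independent(corr, n_eff, i, j, cond=(), alpha=0.05):
    """Fisher z-test, with n_eff (assumed given) in place of the sample size.

    Requires n_eff > len(cond) + 3.
    """
    r = partial_correlation(corr, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep the log finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n_eff - len(cond) - 3) * abs(z)
    return stat <= NormalDist().inv_cdf(1 - alpha / 2)


# Toy input: correlations implied by a chain X0 -> X1 -> X2.
chain = np.array([[1.00, 0.60, 0.36],
                  [0.60, 1.00, 0.60],
                  [0.36, 0.60, 1.00]])
avg = average_correlation([chain, chain])  # stand-in for Gibbs samples
print(is_independent(avg, 200, 0, 2))             # marginal test
print(is_independent(avg, 200, 0, 2, cond=(1,)))  # test given X1
```

In a PC run, a test like `is_independent` would be called for each pair of variables with conditioning sets of growing size, and an edge would be removed whenever independence is accepted.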
Similar resources
Causal Discovery in Climate Science Using Graphical Models
We use the framework of probabilistic graphical models developed by Pearl [1] and by Spirtes et al. [2]. Specifically, we use algorithms for constraint-based structure learning, such as the PC algorithm developed by Spirtes and Glymour [3] and modifications thereof that deal with temporal data. The PC algorithm generates one or more graph representations that describe the potential causal pathw...
A fast PC algorithm for high dimensional causal discovery with multi-core PCs
Discovering causal relationships from observational data is a crucial problem with applications in many research areas. The PC algorithm is the state-of-the-art constraint-based method for causal discovery. However, the runtime of the PC algorithm is, in the worst case, exponential in the number of nodes (variables), and thus it is inefficient when applied to high dimensional data, e.g....
A Partial Correlation-Based Algorithm for Causal Structure Discovery with Continuous Variables
We present an algorithm for causal structure discovery suited in the presence of continuous variables. We test a version based on partial correlation that is able to recover the structure of a recursive linear equations model and compare it to the well-known PC algorithm on large networks. PC is generally outperformed in run time and number of
Bayesian Probabilities for Constraint-Based Causal Discovery
We target the problem of accuracy and robustness in causal inference from finite data sets. Our aim is to combine the inherent robustness of the Bayesian approach with the theoretical strength and clarity of constraint-based methods. We use a Bayesian score to obtain probability estimates on the input statements used in a constraint-based procedure. These are subsequently processed in decreasin...
Learning Causal Models of Relational Domains
Methods for discovering causal knowledge from observational data have been a persistent topic of AI research for several decades. Essentially all of this work focuses on knowledge representations for propositional domains. In this paper, we present several key algorithmic and theoretical innovations that extend causal discovery to relational domains. We provide strong evidence that effective le...